Trident: A Scalable Architecture for Scalar, Vector, and Matrix Operations

Authors

  • Mostafa I. Soliman
  • Stanislav G. Sedukhin
Abstract

Within a few years it will be possible to integrate a billion transistors on a single chip. At this integration level, we propose using a high-level ISA to express parallelism to the hardware instead of spending a huge transistor budget to extract it dynamically. Since the fundamental data structures for a wide variety of applications are scalars, vectors, and matrices, our proposed Trident processor extends the classical vector ISA with matrix operations. The Trident processor consists of a set of parallel vector pipelines (PVPs) combined with a fast in-order scalar core. The PVPs can access both vector and matrix register files to perform vector, matrix, and matrix-vector operations. One key point of our design is the exploitation of up to three levels of data parallelism. Another key point is the use of ring register files for storing vector and matrix data. The ring structure of the register files reduces the number and size of the address decoders, the number of ports, the area overhead caused by the address bus, and the number of registers attached to bit lines, and it also provides local communication between PVPs. Scaling the Trident processor does not require more fetch, decode, or issue bandwidth; it requires only replicating PVPs and increasing the register file size. Scientific, engineering, multimedia, and many other applications that are based on a mixture of scalar, vector, and matrix operations can be sped up on the Trident processor.
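
As a rough illustration of the ring register idea described in the abstract, the following Python toy model lets each lane (standing in for a PVP) read only a fixed slot at the head of a ring, while a rotation brings the next group of elements to the lanes. The names `RingRegister` and `ring_vector_add`, the lane count, and the rotate-by-lane-width policy are all assumptions made for illustration; they are not taken from the paper and do not describe the actual Trident hardware.

```python
from collections import deque


class RingRegister:
    """Toy ring register (hypothetical): lanes see only the first `lanes` slots."""

    def __init__(self, values, lanes):
        if lanes <= 0 or len(values) % lanes != 0:
            raise ValueError("length must be a positive multiple of the lane count")
        self.ring = deque(values)
        self.lanes = lanes

    def head(self):
        # Each lane reads its own fixed slot, so no per-element addressing is modeled.
        return [self.ring[i] for i in range(self.lanes)]

    def rotate(self):
        # Shift the ring left by one lane-width so the next group reaches the head.
        self.ring.rotate(-self.lanes)


def ring_vector_add(a, b, lanes):
    """Element-wise a + b, processed `lanes` elements per step (assumed policy)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ra, rb = RingRegister(a, lanes), RingRegister(b, lanes)
    out = []
    for _ in range(len(a) // lanes):
        out.extend(x + y for x, y in zip(ra.head(), rb.head()))
        ra.rotate()
        rb.rotate()
    return out
```

In this sketch, increasing `lanes` shortens the loop without changing how the operation is issued, which loosely mirrors the abstract's point that scaling comes from replicating pipelines and not from widening fetch, decode, or issue.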

Similar resources

Matrix Bidiagonalization on the Trident Processor

This paper discusses the implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which is rich in a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor....

BLAS on the Trident Processor: Implementation and Performance Evaluation

This paper describes the implementation of the Basic Linear Algebra Subprograms (BLAS), which are widely used in many applications, on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform vector-vector, matrix-vector, and matrix-matrix operations needed for implementing BLAS. The TFLOPS rate on infinite-size pro...

Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture

Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...

Practical Implementation of Scalar and Vector Control Methods on a Rotor Surface Type Permanent Magnet Synchronous Machine Drive/System Using a PC

In this paper, using a personal computer (PC), the practical implementation of scalar and vector control methods on a three-phase rotor surface-type permanent magnet synchronous machine drive is discussed. Based on the machine dynamic equations and the above control strategies, two block diagrams are presented first for closed-loop speed control of the machine drive/system. Then, the desig...

Publication date: 2002